Analysing Gene Expression of Breast Cancer patients

Iben Sommerand s203522
Jonas Sennek s203516
Emilie Wenner s193602
Torbjørn Bak Regueira s203555
Vedis Arntzen s203546

Introduction

  • 2 296 840 new breast cancer patients worldwide in 2022 [1].

  • Aim of project: Explore and analyse patterns in breast cancer gene expression data.

Materials and method

  • The analysis was performed on the dataset “GDC TCGA Breast Cancer (BRCA)” from xenabrowser.net

  • Our data:

    • Gene expression (RNAseq) and phenotype metadata

Methods: Data preparation

  • Data obtained programmatically

  • Pivoted the gene expression dataset to long format to make it tidy

  • The two datasets were joined on the patient IDs

  • Mutated the dataset to add new columns:

    • Age groups
    • Converted days to years for several relevant columns
  • Analytical methods:

    • Descriptive data analysis, principal component analysis (PCA) and linear modelling
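The preparation steps listed above can be sketched on toy data. This is a minimal illustration, not the project's actual script: the column names (`patient_id`, `age_at_diagnosis_days`, `days_to_death`), the gene and patient IDs, and the age group cut-offs are all assumptions made for the example.

```r
library(dplyr)
library(tidyr)

# Toy stand-in for the expression data: one row per gene, one column per patient.
# Gene and patient IDs are invented for illustration.
expr_wide <- tibble(
  gene = c("GENE_A", "GENE_B"),
  P1 = c(5.1, 0.0),
  P2 = c(4.7, 2.3),
  P3 = c(6.0, 1.1)
)

# Toy stand-in for the phenotype metadata; column names are assumptions.
pheno <- tibble(
  patient_id = c("P1", "P2", "P3"),
  age_at_diagnosis_days = c(14610, 21915, 29220),
  days_to_death = c(NA, 730.5, 1461)
)

# Pivot longer: one row per gene-patient pair instead of one column per patient.
expr_long <- expr_wide %>%
  pivot_longer(cols = -gene, names_to = "patient_id", values_to = "expression")

# Join the two datasets on the patient ID, then add the derived columns.
brca <- expr_long %>%
  inner_join(pheno, by = "patient_id") %>%
  mutate(
    age_years = age_at_diagnosis_days / 365.25,   # days -> years
    years_to_death = days_to_death / 365.25,      # days -> years
    age_group = cut(
      age_years,
      breaks = c(0, 50, 70, Inf),                 # hypothetical cut-offs
      labels = c("<50", "50-69", "70+"),
      right = FALSE
    )
  )
```

Dividing by 365.25 rather than 365 accounts for leap years on average; either choice is a convention and should be stated alongside the results.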

Methods:

Show flowchart here!!!

Descriptive analysis 1: Overview of the data

Figure 2: Gender and ethnicity distribution within the data

Figure 3: Cancer stage distribution within the data

Descriptive analysis 2: Different effects on vitality

Figure 4: Vitality based on cancer type

Figure 5: Vitality by age

Figure 6: Survival time by prior malignancy

Analysis: Linear model

Show Jonas plot here
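The slides do not state which response and predictors the linear model used, so the following is only a sketch of the general approach in base R. It fits the expression of a single hypothetical gene against age on simulated data; the variable names and the simulated effect size are assumptions.

```r
set.seed(1)

# Simulated stand-in data: expression of one gene with a small positive age effect.
n <- 100
toy <- data.frame(age_years = runif(n, min = 30, max = 90))
toy$expression <- 2 + 0.05 * toy$age_years + rnorm(n, sd = 0.5)

# Ordinary least squares fit of expression on age.
fit <- lm(expression ~ age_years, data = toy)

# Pull the slope estimate and its p-value out of the coefficient table.
coef_table <- summary(fit)$coefficients
slope <- coef_table["age_years", "Estimate"]
p_value <- coef_table["age_years", "Pr(>|t|)"]
```

When one such model is fitted per gene, the resulting p-values would normally need a multiple-testing correction, for example with `p.adjust()`.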

Analysis: Investigating cancer stages

Figure: Survival time by cancer stage (results/05_01_years_until_death_by_cancer_stage.png)

Figure: Vital status by cancer stage (results/05_02_vital_status_by_cancer_stage.png)

Analysis: PCA

Figure: Principal component analysis (results/06_pca_plot_2.png)
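As a rough illustration of how a PCA like this can be computed, the sketch below runs base R's `prcomp()` on a simulated samples-by-genes matrix. The matrix size, the sample and gene names, and the choice to centre and scale are assumptions; the project's actual preprocessing is not described on the slides.

```r
set.seed(42)

# Simulated expression matrix: rows are samples, columns are genes.
n_samples <- 30
n_genes <- 5
expr <- matrix(
  rnorm(n_samples * n_genes),
  nrow = n_samples,
  dimnames = list(
    paste0("S", seq_len(n_samples)),
    paste0("GENE_", seq_len(n_genes))
  )
)

# Centre and scale each gene so that no single gene dominates the components.
pca <- prcomp(expr, center = TRUE, scale. = TRUE)

# Proportion of variance explained by each principal component.
var_explained <- pca$sdev^2 / sum(pca$sdev^2)

# Sample coordinates on the first two components, ready for a scatter plot.
scores <- as.data.frame(pca$x[, 1:2])
```

The `scores` data frame can then be joined with phenotype columns such as cancer stage to colour the points in the plot.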

Discussion:

  • Detecting the cancer at an early stage appears to be associated with a higher chance of survival

  • Limitations and future work

    • Compare against healthy tissue samples (e.g. GTEx)